Abstract: We present a graph-based variational algorithm for multiclass classification of high-dimensional data, motivated by total variation techniques. The energy functional is based on a diffuse interface model with a periodic potential. We augment the model by introducing an alternative measure of smoothness that preserves symmetry among the class labels. Through this modification of the standard Laplacian, we construct an efficient multiclass method that allows for sharp transitions between classes. The experimental results demonstrate that our approach is competitive with the state of the art among other graph-based algorithms.



1 INTRODUCTION

Many tasks in pattern recognition and machine learning rely on the ability to quantify local similarities in data, and to infer meaningful global structure from such local characteristics (Coifman et al., 2005). In the classification framework, the desired global structure is a descriptive partition of the data into categories or classes. Many studies have been devoted to binary classification problems. The multiple-class case, where the data is partitioned into more than two clusters, is more challenging. One approach is to treat the problem as a series of binary classification problems (Allwein et al., 2000). In this paper, we develop an alternative method, involving a multiple-class extension of the diffuse interface model introduced in (Bertozzi and Flenner, 2012).

The diffuse interface model by Bertozzi and Flenner combines methods for diffusion on graphs with efficient partial differential equation techniques to solve binary segmentation problems. As with other methods inspired by physical phenomena (Bertozzi et al., 2007; Jung et al., 2007; Li and Kim, 2011), it requires the minimization of an energy expression, specifically the Ginzburg-Landau (GL) energy functional. The formulation generalizes the GL functional to the case of functions defined on graphs, and its minimization is related to the minimization of weighted graph cuts (Bertozzi and Flenner, 2012). In this sense, it parallels other techniques based on inference on graphs via diffusion operators or function estimation (Coifman et al., …; Hein and Setzer, 2011).

Multiclass segmentation methods that cast the problem as a series of binary classification problems use a number of different strategies: (i) deal directly with some binary coding or indicator for the labels (Dietterich and Bakiri, 1995; Wang et al., 2008), (ii) build a hierarchy or combination of classifiers based on the one-vs-all approach or on class rankings (Hastie and Tibshirani, 1998; Har-Peled et al., 2003), or (iii) apply a recursive partitioning scheme consisting of successively subdividing clusters until the desired number of classes is reached (Szlam and Bresson, 2010; Hein and Setzer, 2011). While these approaches have advantages, such as possible robustness to mislabeled data, there can be a considerable number of classifiers to compute, and performance is affected by the number of classes to partition.

In contrast, we propose an extension of the diffuse interface model that obtains a simultaneous segmentation into multiple classes. The multiclass extension is built by modifying the GL energy functional to remove the prejudicial effect that the order of the labelings, given by integer values, has in the smoothing term of the original binary diffuse interface model. A new term that promotes homogenization in a multiclass setup is introduced. The expression penalizes data points that are located close in the graph but are not assigned to the same class. This penalty is applied independently of how different the integer values representing the class labels are. In this way, the characteristics of the multiclass classification task are incorporated directly into the energy functional, with a measure of smoothness independent of label order, allowing us to obtain high-quality results. Alternative multiclass methods minimize a Kullback-Leibler divergence function (Subramanya and Bilmes, 2011) or expressions involving the discrete Laplace operator on graphs (Zhou et al., 2004; Wang et al., 2008).

This paper is organized as follows. Section 2 previews the diffuse interface model for binary classification and describes its application to semi-supervised learning. Section 3 discusses our proposed multiclass extension and the corresponding computational algorithm. Section 4 presents results obtained with our method. Finally, Section 5 draws conclusions and delineates future work.



2 DATA SEGMENTATION WITH THE GINZBURG-LANDAU MODEL



The diffuse interface model (Bertozzi and Flenner, 2012) is based on a continuous approach, using the Ginzburg-Landau (GL) energy functional to measure the quality of data segmentation. A good segmentation is characterized by a state with small energy. Let $u(x)$ be a scalar field defined over a space of arbitrary dimensionality and representing the state of the system. The GL energy is written as the functional

$$E_{GL}(u) = \frac{\epsilon}{2}\int |\nabla u|^2\,dx + \frac{1}{\epsilon}\int F(u)\,dx, \qquad (1)$$

with $\nabla$ denoting the spatial gradient operator, $\epsilon > 0$ a real constant, and $F$ a double-well potential with minima at $\pm 1$:



$$F(u) = \frac{1}{4}\left(u^2 - 1\right)^2. \qquad (2)$$



Segmentation requires minimizing the GL functional. The norm of the gradient is a smoothing term that penalizes variations in the field $u$. The potential term, on the other hand, compels $u$ to adopt the discrete labels of $+1$ or $-1$, clustering the state of the system around two classes. Jointly minimizing these two terms pushes the system domain towards homogeneous regions with values close to the minima of the double-well potential, making the model appropriate for binary segmentation.

The smoothing term and potential term are in conflict at the interface between the two regions, with the first term favoring a gradual transition, and the second term penalizing deviations from the discrete labels. A compromise between these conflicting goals is established via the constant $\epsilon$. A small value of $\epsilon$ denotes a short transition length and a sharper interface, while a large $\epsilon$ weights the gradient norm more, leading to a more gradual transition. The result is a diffuse interface between regions, with sharpness regulated by $\epsilon$.

It can be shown that in the limit $\epsilon \to 0$ this functional approximates the total variation (TV) formulation in the sense of functional $\Gamma$-convergence (Kohn and Sternberg, 1989), producing piecewise constant solutions but with greater computational efficiency than conventional TV minimization methods. Thus, the diffuse interface model provides a framework to compute piecewise constant functions with diffuse transitions, approaching the ideal of the TV formulation, but with the advantage that the smooth energy functional is more tractable numerically and can be minimized by simple numerical methods such as gradient descent.

The GL energy has been used to approximate the TV norm for image segmentation (Bertozzi and Flenner, 2012) and image inpainting (Bertozzi et al., 2007; Dobrosotskaya and Bertozzi, 2008). Furthermore, a calculus on graphs equivalent to TV has been introduced in (Gilboa and Osher, 2008; Szlam and Bresson, 2010).

2.1 Application of Diffuse Interface Models to Graphs

An undirected, weighted neighborhood graph is used to represent the local relationships in the data set. This is a common technique to segment classes that are not linearly separable. In the $N$-neighborhood graph model, each vertex $z_i \in Z$ of the graph corresponds to a data point with feature vector $x_i$, while the weight $w_{ij}$ is a measure of similarity between $z_i$ and $z_j$. Moreover, it satisfies the symmetry property $w_{ij} = w_{ji}$. The neighborhood is defined as the set of $N$ closest points in the feature space. Accordingly, edges exist between each vertex and the vertices of its $N$ nearest neighbors. Following the approach of (Bertozzi and Flenner, 2012), we calculate weights using the local scaling of Zelnik-Manor and Perona (Zelnik-Manor and Perona, 2005),



$$w_{ij} = \exp\left(-\frac{\|x_i - x_j\|^2}{\tau(x_i)\,\tau(x_j)}\right). \qquad (3)$$

Here, $\tau(x_i) = \|x_i - x_i^M\|$ defines a local value for each $x_i$, where $x_i^M$ is the position of the $M$th closest data point to $x_i$, and $M$ is a global parameter.



It is convenient to express calculations on graphs 
via the graph Laplacian matrix, denoted by L. The 
procedure we use to build the graph Laplacian is as 
follows. 

1. Compute the similarity matrix $W$ with components $w_{ij}$ defined in (3). As the neighborhood relationship is not symmetric, the resulting matrix $W$ is also not symmetric. Make it a symmetric matrix by connecting vertices $z_i$ and $z_j$ if $z_i$ is among the $N$ nearest neighbors of $z_j$ or if $z_j$ is among the $N$ nearest neighbors of $z_i$ (von Luxburg, 2006).

2. Define $D$ as a diagonal matrix whose $i$th diagonal element represents the degree of the vertex $z_i$, evaluated as

$$d_i = \sum_j w_{ij}. \qquad (4)$$

3. Calculate the graph Laplacian: $L = D - W$.

Generally, the graph Laplacian is normalized to guarantee spectral convergence in the limit of large sample size (von Luxburg, 2006). The symmetric normalized graph Laplacian $L_s$ is defined as

$$L_s = D^{-1/2} L D^{-1/2} = I - D^{-1/2} W D^{-1/2}. \qquad (5)$$
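The three-step construction above, together with the local scaling of Eq. (3), can be sketched in a few lines of NumPy. This is a minimal dense-matrix sketch for small data sets, assuming the data points are distinct and $M$ is smaller than the number of points; the function name `build_graph` is hypothetical, and a practical implementation on large data would use a nearest-neighbor search structure and sparse matrices instead of the full distance matrix.

```python
import numpy as np

def build_graph(X, N=10, M=10):
    """Symmetric N-nearest-neighbor graph with local scaling, Eqs. (3)-(5).

    Hypothetical helper; assumes distinct rows in X and M < len(X).
    """
    n = X.shape[0]
    # Pairwise squared Euclidean distances ||x_i - x_j||^2.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    order = np.argsort(sq, axis=1)          # column 0 is the point itself
    # tau(x_i): distance from x_i to its M-th closest other point.
    tau = np.sqrt(sq[np.arange(n), order[:, M]])
    # Connect each vertex to its N nearest neighbors, then symmetrize with "or" (step 1).
    mask = np.zeros((n, n), dtype=bool)
    mask[np.repeat(np.arange(n), N), order[:, 1:N + 1].ravel()] = True
    mask |= mask.T
    W = np.where(mask, np.exp(-sq / np.outer(tau, tau)), 0.0)   # Eq. (3)
    d = W.sum(axis=1)                                           # Eq. (4)
    L = np.diag(d) - W                                          # step 3
    inv_sqrt_d = 1.0 / np.sqrt(d)
    Ls = np.eye(n) - inv_sqrt_d[:, None] * W * inv_sqrt_d[None, :]  # Eq. (5)
    return W, d, L, Ls
```

A quick sanity check of the output is that $L_s$ is symmetric, positive semidefinite, and annihilates the vector $D^{1/2}\mathbf{1}$.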

Data segmentation can now be carried out through a graph-based formulation of the GL energy. To implement this task, a fidelity term is added to the functional as initially suggested in (Dobrosotskaya and Bertozzi, 2010). This enables the specification of a priori information in the system, for example the known labels of certain points in the data set. This kind of setup is called semi-supervised learning (SSL). The discrete GL energy for SSL on graphs can be written as (Bertozzi and Flenner, 2012):



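$$E_{GL}^{SSL}(u) = \frac{\epsilon}{2}\langle u, L_s u\rangle + \frac{1}{\epsilon}\sum_{z_i \in Z} F(u(z_i)) + \sum_{z_i \in Z}\frac{\lambda(z_i)}{2}\left(u(z_i) - u_0(z_i)\right)^2. \qquad (6)$$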

In the discrete formulation, $u$ is a vector whose component $u(z_i)$ represents the state of the vertex $z_i$, $\epsilon > 0$ is a real constant characterizing the smoothness of the transition between classes, and $\lambda(z_i)$ is a fidelity weight taking value $\lambda > 0$ if the label $u_0(z_i)$ (i.e. class) of the data point associated with vertex $z_i$ is known beforehand, or $\lambda(z_i) = 0$ if it is not known (semi-supervised).
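To make the three terms of Eq. (6) concrete, the sketch below evaluates the discrete SSL energy for a given state vector. The function name `gl_ssl_energy` and the array layout (one entry per vertex, with `lam[i] = 0` for unlabeled vertices and an arbitrary `u0[i]` there) are illustrative assumptions, not part of the original formulation.

```python
import numpy as np

def double_well(u):
    """Double-well potential F(u) = (u^2 - 1)^2 / 4, Eq. (2)."""
    return 0.25 * (u ** 2 - 1.0) ** 2

def gl_ssl_energy(u, Ls, u0, lam, eps):
    """Discrete GL energy for SSL on a graph, Eq. (6).

    u, u0, lam: length-n arrays; lam[i] = 0 where the label is unknown.
    Ls: symmetric normalized graph Laplacian (n x n).
    """
    smoothing = 0.5 * eps * u @ (Ls @ u)             # (eps/2) <u, Ls u>
    potential = np.sum(double_well(u)) / eps          # (1/eps) sum F(u_i)
    fidelity = 0.5 * np.sum(lam * (u - u0) ** 2)      # sum lam_i/2 (u_i - u0_i)^2
    return smoothing + potential + fidelity
```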

Equation (6) may be understood as an example of the more general form of an energy functional for data classification,

$$E(u) = \|u\|_a + \frac{\lambda}{2}\|u - f\|_b^2, \qquad (7)$$

where the norm $\|u\|_a$ is a regularization term and $\|u - f\|_b$ is a fidelity term. The choice of the regularization norm $\|\cdot\|_a$ has non-trivial consequences in the final classification accuracy. Attractive qualities of the norm $\|\cdot\|_a$ include allowing classes to be close in a metric space and obtaining segmentations for nonlinearly separable data. Both of these goals are addressed using the GL energy functional for SSL.

Minimizing the functional simulates a diffusion process on the graph. The information of the few labels known is propagated through the discrete structure by means of the smoothing term, while the potential term clusters the vertices around the states $\pm 1$ and the fidelity term enforces the known labels. The energy minimization process itself attempts to reduce the interface regions. Note that in the absence of the fidelity term, the process could lead to a trivial steady-state solution of the diffusion equation, with all data points assigned the same label.

The final state $u(z_i)$ of each vertex is obtained by thresholding, and the resulting homogeneous regions with labels of $+1$ and $-1$ constitute the two-class data segmentation.
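The diffusion-plus-thresholding process described above can be illustrated with explicit gradient descent on Eq. (6), using $F'(u) = u^3 - u$ from Eq. (2). This is a hedged sketch rather than the authors' solver: the function name `binary_gl_ssl`, the uniform random initialization on $(-1, 1)$, and the fixed number of steps are assumptions chosen for a small example.

```python
import numpy as np

def binary_gl_ssl(Ls, u0, lam, eps=1.0, dt=0.01, steps=1000, seed=0):
    """Gradient descent on the binary GL SSL energy, Eq. (6), then threshold at 0."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(-1.0, 1.0, size=len(u0))   # assumed random initial state
    for _ in range(steps):
        # dE/du = eps * Ls u + F'(u) / eps + lam * (u - u0), with F'(u) = u^3 - u
        grad = eps * (Ls @ u) + (u ** 3 - u) / eps + lam * (u - u0)
        u = u - dt * grad
    return np.where(u >= 0.0, 1, -1)           # thresholding into the two classes
```

On a toy graph of two weakly linked 4-vertex cliques, with three labeled vertices per clique, the unlabeled vertex of each clique ends up with its own clique's label, which is the label propagation behavior described above.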



3 MULTICLASS EXTENSION 

The double-well potential in the diffuse interface model for SSL flows the state of the system towards two definite labels. Multiple-class segmentation requires a more general potential function $F(u)$ that allows clusters around more than two labels. For this purpose, we use the periodic-well potential suggested by Li and Kim (Li and Kim, 2011),

$$F(u) = \frac{1}{2}\{u\}^2\left(\{u\} - 1\right)^2, \qquad (8)$$

where $\{u\}$ denotes the fractional part of $u$,

$$\{u\} = u - \lfloor u \rfloor, \qquad (9)$$

and $\lfloor u \rfloor$ is the largest integer not greater than $u$.
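The periodic potential (8), its fractional-part helper (9), and the derivative that appears later as Eq. (17) translate directly into vectorized functions. A short sketch, with hypothetical function names:

```python
import numpy as np

def frac(u):
    """Fractional part {u} = u - floor(u), Eq. (9)."""
    return u - np.floor(u)

def periodic_potential(u):
    """Periodic-well potential F(u) = {u}^2 ({u} - 1)^2 / 2, Eq. (8)."""
    f = frac(u)
    return 0.5 * f ** 2 * (f - 1.0) ** 2

def periodic_potential_grad(u):
    """F'(u) = 2{u}^3 - 3{u}^2 + {u}; equals 0 at both ends of each unit interval."""
    f = frac(u)
    return 2.0 * f ** 3 - 3.0 * f ** 2 + f
```

Every integer is a minimum with $F = 0$, and every half-integer is a maximum with $F = 1/32$. Because $F'$ vanishes at both ends of each unit interval, the derivative is continuous across integers, so the periodic potential itself poses no difficulty for gradient descent.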

This periodic potential well promotes a multiclass solution, but the graph Laplacian term in Equation (6) also requires modification for effective calculations due to the fixed ordering of class labels in the multiple class setting. The graph Laplacian term penalizes large changes in the spatial distribution of the system state more than smaller gradual changes. In a multiclass framework, this implies that the penalty for two spatially contiguous classes with different labels may vary according to the (arbitrary) ordering of the labels.

This phenomenon is shown in Figure 1. Suppose that the goal is to segment the image into three classes: class 0 composed of the black region, class 1 composed of the gray region, and class 2 composed of the white region. It is clear that the horizontal interfaces comprise a jump of size 1 (analogous to a two-class segmentation) while the vertical interface implies a jump of size 2. Accordingly, the smoothing term will assign a higher cost to the vertical interface, even though from the point of view of the classification there is no specific reason for this. In this example, the problem cannot be solved with a different label assignment: there will always be an interface with a higher cost than the others, independent of the integer values used.

Thus, the multiclass approach breaks the symmetry among classes, influencing the diffuse interface evolution in an undesirable manner. Eliminating this inconvenience requires restoring the symmetry, so that the difference between two classes is always the same, regardless of their labels. This objective is achieved by introducing a new class difference measure.



Figure 1: Three class segmentation. Black: class 0. Gray: 
class 1. White: class 2. 



3.1 Generalized Difference Function 

The final class labels are determined by thresholding the state $u(z_i)$ of each vertex, with the label $y_i$ set to the nearest integer:

$$y_i = \left\lfloor u(z_i) + \frac{1}{2} \right\rfloor. \qquad (10)$$



The boundaries between classes then occur at half-integer values corresponding to the unstable equilibrium states of the potential well. Define the function $\hat{r}(x)$ to represent the distance to the nearest half-integer:

$$\hat{r}(x) = \left|\frac{1}{2} - \{x\}\right|. \qquad (11)$$

A schematic of $\hat{r}(x)$ is depicted in Figure 2. The $\hat{r}(x)$ function is used to define a generalized difference function between classes that restores symmetry in the energy functional. Define the generalized difference function $\rho$ as:

$$\rho(u(z_i), u(z_j)) = \begin{cases} \hat{r}(u(z_i)) + \hat{r}(u(z_j)) & y_i \neq y_j \\ \left|\hat{r}(u(z_i)) - \hat{r}(u(z_j))\right| & y_i = y_j \end{cases} \qquad (12)$$



Figure 2: Schematic interpretation of generalized difference: $\hat{r}(x)$ measures distance to nearest half-integer, and $\rho$ then corresponds to distance on tree.



Thus, if the vertices are in different classes, the differences $\hat{r}(x)$ between each state's value and the nearest half-integer are added, whereas if they are in the same class, these differences are subtracted. The function $\rho(x, y)$ corresponds to the tree distance (see Figure 2). Strictly speaking, $\rho$ is not a metric since it does not satisfy $\rho(x, y) = 0 \Rightarrow x = y$. Nevertheless, the cost of interfaces between classes becomes the same regardless of class labeling when this generalized distance function is implemented.
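Equations (10) to (12) can be sketched as vectorized NumPy functions; the names `class_label`, `r_hat` and `rho` are hypothetical. The checks after the block illustrate the two properties discussed above: relabeling a class does not change $\rho$, and $\rho$ can vanish for two distinct states in the same class (for example $-0.2$ and $0.2$, which are both at distance 0.3 from the boundary).

```python
import numpy as np

def class_label(u):
    """Nearest-integer label y = floor(u + 1/2), Eq. (10)."""
    return np.floor(np.asarray(u, dtype=float) + 0.5).astype(int)

def r_hat(u):
    """Distance to the nearest half-integer, |1/2 - {u}|, Eq. (11)."""
    u = np.asarray(u, dtype=float)
    return np.abs(0.5 - (u - np.floor(u)))

def rho(a, b):
    """Generalized difference of Eq. (12); broadcasts over array arguments."""
    ra, rb = r_hat(a), r_hat(b)
    different = class_label(a) != class_label(b)
    # Different classes: add the distances to the boundary; same class: subtract them.
    return np.where(different, ra + rb, np.abs(ra - rb))
```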

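The GL energy functional for SSL, using the new generalized difference function $\rho$, is expressed as

$$E_{MGL}^{SSL}(u) = \frac{\epsilon}{2}\sum_{z_i \in Z}\sum_{z_j \in Z}\frac{w_{ij}}{\sqrt{d_i d_j}}\left[\rho(u(z_i), u(z_j))\right]^2 + \frac{1}{2\epsilon}\sum_{z_i \in Z}\{u(z_i)\}^2\left(\{u(z_i)\} - 1\right)^2 + \sum_{z_i \in Z}\frac{\lambda(z_i)}{2}\left(u(z_i) - u_0(z_i)\right)^2. \qquad (13)$$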
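For small graphs, Eq. (13) can be evaluated directly with dense matrices. The sketch below assumes a dense symmetric weight matrix with positive degrees and uses the normalized weights $w_{ij}/\sqrt{d_i d_j}$ as written in (13); the function names are illustrative. It can be used to check the symmetry claim numerically: permuting the class labels (shifting each state by an integer that depends only on its class) leaves this energy unchanged, whereas the quadratic form $u^T L u$ of the standard Laplacian generally changes.

```python
import numpy as np

def _r_hat(u):
    # Distance to the nearest half-integer, Eq. (11).
    return np.abs(0.5 - (u - np.floor(u)))

def _rho(a, b):
    # Generalized difference, Eq. (12).
    same = np.floor(a + 0.5) == np.floor(b + 0.5)
    return np.where(same, np.abs(_r_hat(a) - _r_hat(b)), _r_hat(a) + _r_hat(b))

def mgl_energy(u, W, u0, lam, eps):
    """Multiclass GL energy for SSL, Eq. (13), for a dense symmetric weight matrix W."""
    d = W.sum(axis=1)
    Wn = W / np.sqrt(np.outer(d, d))              # w_ij / sqrt(d_i d_j)
    R = _rho(u[:, None], u[None, :])              # pairwise rho(u_i, u_j)
    smoothing = 0.5 * eps * np.sum(Wn * R ** 2)
    f = u - np.floor(u)
    potential = np.sum(f ** 2 * (f - 1.0) ** 2) / (2.0 * eps)
    fidelity = 0.5 * np.sum(lam * (u - u0) ** 2)
    return smoothing + potential + fidelity
```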

Note that $\rho$ could also be used in the fidelity term, but for simplicity this modification is not included. In practice, this has little effect on the results.

3.2 Computational Algorithm 

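The GL energy functional given by (13) may be minimized iteratively, using gradient descent:

$$u_i^{m+1} = u_i^m - dt\,\frac{\delta E_{MGL}^{SSL}}{\delta u_i}, \qquad (14)$$

where $u_i$ is a shorthand for $u(z_i)$, $dt$ represents the time step, and the gradient direction is given by:

$$\frac{\delta E_{MGL}^{SSL}}{\delta u_i} = \epsilon\,G(u_i^m) + \frac{1}{\epsilon}F'(u_i^m) + \lambda_i\left(u_i^m - u_{0,i}\right), \qquad (15)$$

$$G(u_i^m) = \sum_{z_j \in Z}\frac{w_{ij}}{\sqrt{d_i d_j}}\left[\hat{r}(u_i^m) \pm \hat{r}(u_j^m)\right]\hat{r}'(u_i^m), \qquad (16)$$

$$F'(u_i^m) = 2\{u_i^m\}^3 - 3\{u_i^m\}^2 + \{u_i^m\}, \qquad (17)$$

where, following (12), the sign in (16) is $+$ when $y_i \neq y_j$ and $-$ when $y_i = y_j$.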

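Algorithm 1 Calculate u
Require: $\epsilon > 0$, $dt > 0$, $m_{max} > 0$, $K$ given
Ensure: out = $u^{m_{max}}$
  $u^0 \leftarrow \mathrm{rand}((0, K)) - \frac{1}{2}$, $m \leftarrow 0$
  for $m < m_{max}$ do
    $i \leftarrow 0$
    for $i < |Z|$ do
      $u_i^{m+1} \leftarrow u_i^m - dt\left(\epsilon\,G(u_i^m) + \frac{1}{\epsilon}F'(u_i^m) + \lambda_i(u_i^m - u_{i,0})\right)$
      if Label$(u_i^{m+1}) \neq$ Label$(u_i^m)$ then
        $(v_i)_k \leftarrow k + \{u_i^{m+1}\}$
        $u_i^{m+1} \leftarrow (v_i)_{\hat{k}}$, where $\hat{k} = \arg\min_{0 \le k < K}\sum_{z_j \in Z}\frac{w_{ij}}{\sqrt{d_i d_j}}\left[\rho((v_i)_k, u_j^m)\right]^2$
      end if
      $i \leftarrow i + 1$
    end for
    $m \leftarrow m + 1$
  end for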



The gradient of the generalized difference function $\rho$ is not defined at half-integer values. Hence, we modify the method using a greedy strategy: after detecting that a vertex changes class, the new class that minimizes the smoothing term is selected, and the fractional part of the state computed by the gradient descent update is preserved. Consequently, the new state of vertex $i$ is the result of gradient descent, but if this causes a change in class, then a new state is determined.

Specifically, let $k$ represent an integer in the range of the problem, i.e. $k \in [0, K-1]$, where $K$ is the number of classes in the problem. Given the fractional part $\{u_i\}$ resulting from the gradient descent update, define $(v_i)_k = k + \{u_i\}$. Find the integer $k$ that minimizes $\sum_j \frac{w_{ij}}{\sqrt{d_i d_j}}\left[\rho((v_i)_k, u_j)\right]^2$, the smoothing term in the energy functional, and use $(v_i)_k$ as the new vertex state. A summary of the procedure is shown in Algorithm 1, with $m_{max}$ denoting the maximum number of iterations.
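Algorithm 1 can be sketched in NumPy as a per-vertex gradient step (Eqs. (14) to (17)) followed by the greedy reassignment. This is an illustrative sketch under stated assumptions, not the authors' code: the function name `mgl_ssl` is hypothetical; vertices are updated in place one at a time; vertices with known labels are initialized at those labels rather than at random; and on a class change the candidates are built as $k + (u - y)$, which differs from $u$ by an integer (so $\{u\}$ is preserved as in the text) but guarantees that candidate $k$ lies in class $k$ for every $k \in [0, K-1]$, including when $\{u\} \ge 1/2$.

```python
import numpy as np

def _frac(u):
    return u - np.floor(u)

def _label(u):
    return np.floor(u + 0.5).astype(int)             # Eq. (10)

def _r_hat(u):
    return np.abs(0.5 - _frac(u))                    # Eq. (11)

def _rho(a, b):
    ra, rb = _r_hat(a), _r_hat(b)                    # Eq. (12)
    return np.where(_label(a) != _label(b), ra + rb, np.abs(ra - rb))

def mgl_ssl(W, K, fid_idx, fid_lab, eps=1.0, dt=0.01, lam=30.0, m_max=500, seed=0):
    """Sketch of Algorithm 1 on a dense symmetric weight matrix W; returns class labels."""
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    d = W.sum(axis=1)
    Wn = W / np.sqrt(np.outer(d, d))                 # w_ij / sqrt(d_i d_j)
    lam_v = np.zeros(n)
    lam_v[fid_idx] = lam
    u0 = np.zeros(n)
    u0[fid_idx] = fid_lab
    u = rng.uniform(0.0, K, size=n) - 0.5            # u^0 <- rand((0, K)) - 1/2
    u[fid_idx] = fid_lab                             # assumption: known vertices start at their labels
    for _ in range(m_max):
        for i in range(n):
            f = _frac(u[i])
            # Sign in Eq. (16): "-" for same-class neighbors, "+" otherwise.
            sign = np.where(_label(u) == _label(u[i]), -1.0, 1.0)
            G = np.sum(Wn[i] * (_r_hat(u[i]) + sign * _r_hat(u))) * np.sign(f - 0.5)
            Fp = 2 * f ** 3 - 3 * f ** 2 + f                                      # Eq. (17)
            new = u[i] - dt * (eps * G + Fp / eps + lam_v[i] * (u[i] - u0[i]))   # Eqs. (14)-(15)
            if _label(new) != _label(u[i]):
                # Greedy step: same fractional part, try every class, keep the cheapest.
                cand = np.arange(K) + (new - _label(new))
                cost = [np.sum(Wn[i] * _rho(c, u) ** 2) for c in cand]
                new = cand[int(np.argmin(cost))]
            u[i] = new
    return _label(u)
```

On a toy graph of three weakly linked 4-vertex cliques, with three labeled vertices per clique, the unlabeled vertex of each clique is assigned its clique's class regardless of its random starting value.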



4 RESULTS 

The performance of the multiclass diffuse interface 
model is evaluated using a number of data sets from 
the literature, with differing characteristics. Data and 
image segmentation problems are considered on syn- 
thetic and real data sets. 

4.1 Synthetic Data 

A synthetic three-class segmentation problem is constructed following a procedure analogous to the one used in (Bühler and Hein, 2009) for "two moon" binary classification, using three half circles ("three moons"). The half circles are generated in $\mathbb{R}^2$. The two top half circles have radius 1 and are centered at (0, 0) and (3, 0). The bottom half circle has radius 1.5 and is centered at (1.5, 0.4). We sample 1500 data points (500 from each of these half circles) and embed them in $\mathbb{R}^{100}$. The embedding is completed by adding Gaussian noise with $\sigma^2 = 0.02$ to each of the 100 components for each data point. The dimensionality of the data set, together with the noise, makes this a nontrivial problem.
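The construction of the three-moons data set can be sketched as follows. Several details are assumptions not fixed by the text: points are drawn with angles uniform on $[0, \pi]$ along each half circle, the two-dimensional coordinates occupy the first two of the 100 dimensions, and the noise standard deviation is $\sqrt{0.02}$ so that the variance is 0.02. The function name `three_moons` is hypothetical.

```python
import numpy as np

def three_moons(n_per_class=500, dim=100, noise_var=0.02, seed=0):
    """Three noisy half circles embedded in R^dim, following the description above."""
    rng = np.random.default_rng(seed)
    t = rng.uniform(0.0, np.pi, size=(3, n_per_class))                    # assumed: uniform angles
    top_left = np.column_stack([np.cos(t[0]), np.sin(t[0])])              # r = 1, center (0, 0)
    top_right = np.column_stack([3.0 + np.cos(t[1]), np.sin(t[1])])       # r = 1, center (3, 0)
    bottom = np.column_stack([1.5 + 1.5 * np.cos(t[2]),
                              0.4 - 1.5 * np.sin(t[2])])                  # r = 1.5, center (1.5, 0.4), lower half
    X = np.zeros((3 * n_per_class, dim))
    X[:, :2] = np.vstack([top_left, top_right, bottom])
    X += rng.normal(0.0, np.sqrt(noise_var), size=X.shape)               # variance 0.02 per component
    y = np.repeat(np.arange(3), n_per_class)
    return X, y
```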

The difficulty of the problem is illustrated in Figure 3, where we use both spectral clustering decomposition and the multiclass GL method. The same graph structure is used for both methods. The symmetric graph Laplacian is computed based on edge weights given by (3), using $N = 10$ nearest neighbors and local scaling based on the 10th closest point ($M = 10$). The spectral clustering results are obtained by applying a $k$-means algorithm to the first 3 eigenvectors of the symmetric graph Laplacian. The average error obtained over 100 executions of spectral clustering is 20% (±0.6%). The figure displays the best result obtained, corresponding to an error of 18.67%.
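A minimal sketch of this spectral clustering baseline is to take the eigenvectors of $L_s$ with the smallest eigenvalues and run $k$-means on their rows. The function name is hypothetical, and two choices are assumptions since the text does not specify them: plain Lloyd iterations with a deterministic farthest-point initialization, and no row normalization of the eigenvector matrix.

```python
import numpy as np

def spectral_clustering(Ls, k=3, iters=50):
    """k-means on the rows of the first k eigenvectors of the symmetric Laplacian Ls."""
    _, vecs = np.linalg.eigh(Ls)           # eigenvalues in ascending order
    Y = vecs[:, :k]
    # Farthest-point initialization (assumption): start at row 0, add the farthest row.
    centers = [Y[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(iters):                 # Lloyd iterations
        dist = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(dist, axis=1)
        centers = np.array([Y[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                            for c in range(k)])
    return labels
```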

The multiclass GL method was implemented with the following parameters: interface scale $\epsilon = 1$, step size $dt = 0.01$ and number of iterations $m_{max} = 800$. The fidelity term is determined by labeling 25 points randomly selected from each class (5% of all points), and setting the fidelity weight to $\lambda = 30$ for those points. Several runs of the procedure are performed to isolate effects from the random initialization and the arbitrary selection of fidelity points. The average error obtained over 100 runs with four different fidelity sets is 5.2% (±1.01%). In general terms, the system evolves from an initially inhomogeneous state, rapidly developing small islands around fidelity points that become seeds for homogeneous regions, and progressing to a configuration of classes forming nearly uniform clusters.

Figure 3: Three-class segmentation. Left: spectral clustering. Right: multiclass GL (adaptive $\epsilon$).

The multiclass results were further improved by incrementally decreasing $\epsilon$ to allow sharper transitions between states as in (Bertozzi and Flenner, 2012). With this approach, the average error obtained over 100 runs is reduced to 2.6% (±0.3%). The best result obtained in these runs is displayed in Figure 3 and corresponds to an error of 2.13%. In these runs, $\epsilon$ is reduced from $\epsilon_0 = 2$ to $\epsilon_f = 0.1$ in decrements of 10%, with 40 iterations performed per step. The average computing time per run in this adaptive technique is 1.53 s on an Intel Quad-Core @ 2.4 GHz, without any parallel processing.
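The adaptive schedule amounts to a geometric sequence. The sketch below counts its stages under two assumptions that the text does not pin down: each stage multiplies $\epsilon$ by 0.9, and stages continue while $\epsilon \ge \epsilon_f$. The function name `eps_schedule` is hypothetical.

```python
def eps_schedule(eps0=2.0, eps_f=0.1, decrement=0.10):
    """Geometric schedule eps0, 0.9*eps0, ... kept while eps >= eps_f (assumed stopping rule)."""
    schedule, eps = [], eps0
    while eps >= eps_f:
        schedule.append(eps)
        eps *= 1.0 - decrement
    return schedule

stages = eps_schedule()
print(len(stages), len(stages) * 40)   # number of stages and total iterations at 40 per stage
```

Under these assumptions the schedule above has 29 stages ($2 \cdot 0.9^{28} \approx 0.105$ is the last value not below 0.1), i.e. 1160 iterations at 40 per stage; the MNIST schedule of Section 4.3 ($\epsilon_f = 0.01$) gives 51 stages.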

For comparison, we note the results from the literature for the simpler two moon problem ($\mathbb{R}^{100}$, $\sigma^2 = 0.02$ noise). The best errors reported include: 6% for p-Laplacian (Bühler and Hein, 2009), 4.6% for ratio-minimization relaxed Cheeger cut (Szlam and Bresson, 2010), and 2.3% for binary GL (Bertozzi and Flenner, 2012). While these are not SSL methods, the last of these does involve other prior information in the form of a mass balance constraint. It can be seen that both of our procedures, fixed and adaptive $\epsilon$, produce high-quality results even for the more complex three-class segmentation problem. Calculation times are also competitive with those reported for the binary case (0.5 s to 50 s).

4.2 Image Segmentation 

As another test setup, we use a grayscale image of size 191 × 196, taken from (Jung et al., 2007; Li and Kim, 2011) and composed of 5 classes: black, dark gray, medium gray, light gray and white. This image contains structure, such as an internal hole and junctions where multiple classes meet. The image information is represented through feature vectors defined as $(x_i, y_i, pix_i)$, with $x_i$ and $y_i$ corresponding to the $(x, y)$ coordinates of the pixel and $pix_i$ equal to the intensity of the pixel. All of these are normalized so as to obtain values in the range [0, 1].
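The per-pixel feature vectors can be sketched as follows. The text only states that each feature is normalized to [0, 1]; per-column min-max scaling is our assumption, and the function name `pixel_features` is hypothetical. For the 191 × 196 image this yields 37,436 feature vectors, so the 1500 fidelity points used below are about 4.0% of the pixels.

```python
import numpy as np

def pixel_features(img):
    """Feature vector (x_i, y_i, pix_i) per pixel, each column min-max scaled to [0, 1]."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]                       # row (y) and column (x) coordinates
    feats = np.stack([xs.ravel(), ys.ravel(), img.ravel()], axis=1).astype(float)
    lo, hi = feats.min(axis=0), feats.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)            # avoid dividing by zero for a constant column
    return (feats - lo) / span
```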

The graph is constructed using $N = 30$ nearest neighbors and local scaling based on the 30th closest point ($M = 30$). We use parameters $\epsilon = 1$, $dt = 0.01$ and $m_{max} = 800$. We then choose 1500 random points (4% of the total) for the fidelity term, with $\lambda = 30$. Figure 4 displays the original image with the randomly selected fidelity points (top left), and the five-class segmentation. Each class image shows in white the pixels identified as belonging to the class, and in black the pixels of the other classes. In this case, all the classes are segmented perfectly with an average run time of 59.7 s. The method of Li and Kim (Li and Kim, 2011) also segments this image perfectly, with a reported run time of 0.625 s. However, their approach uses additional information, including a pre-assignment of specific grayscale levels to classes, and the overall densities of each class. Our approach does not require these.

4.3 MNIST Data 

The MNIST data set available at http://yann.lecun.com/exdb/mnist/ is composed of 70,000 images of size 28 × 28, corresponding to a broad sample of handwritten digits 0 through 9. We use the multiclass diffuse interface model to segment the data set automatically into 10 classes, one per handwritten digit. Before constructing the graph, we preprocess the data by normalizing and projecting into 50 principal components, following the approach in (Szlam and Bresson, 2010). No further steps, such as smoothing convolutions, are required. The graph is computed with $N = 10$ nearest neighbors and local scaling based on the 10th closest point ($M = 10$).

Figure 4: Image Segmentation Results. Top left: Original five-class image, with randomly chosen fidelity points displayed. Other panels: the five segmented classes, shown in white.
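The MNIST preprocessing step (normalize, then project onto 50 principal components) can be sketched with a singular value decomposition. The exact normalization is not specified in the text; scaling each image to unit Euclidean norm is our assumption, as is the function name `preprocess`. Loading the data set itself is not shown.

```python
import numpy as np

def preprocess(X, n_components=50):
    """Normalize each image (row), then project onto the leading principal components.

    Assumption: "normalizing" means scaling each row to unit Euclidean norm.
    """
    X = np.asarray(X, dtype=float)
    X = X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)
    Xc = X - X.mean(axis=0)                            # center before PCA
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T                    # coordinates in the top components
```

The resulting 50-dimensional vectors would then be passed to the graph construction of Section 2.1 with $N = M = 10$.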



An adaptive $\epsilon$ variant of the algorithm is implemented, with parameters $\epsilon_0 = 2$, $\epsilon_f = 0.01$, $\epsilon$ decrement 10%, $dt = 0.01$, and 40 iterations per step. For the fidelity term, 7,000 images (10% of the total) are chosen, with weight $\lambda = 30$. The average error obtained over 20 runs with four different fidelity sets is 7% (±0.072%). The confusion matrix for the best result obtained, corresponding to a 6.86% error, is given in Table 1: each row represents the segmentation obtained, while the columns represent the true digit labels. For reference, the average computing time per run in this adaptive technique is 132 s. Note that, in the segmentations, the largest mistakes made are in trying to distinguish digits 4 from 9 and 7 from 9.
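A confusion matrix with the layout of Table 1 (rows: segmentation obtained, columns: true digit) and the corresponding error rate can be computed as sketched below. The function names are hypothetical, and `error_rate` assumes that segment $k$ is meant to represent digit $k$, which is natural in this semi-supervised setting because the fidelity points tie each class to a known digit.

```python
import numpy as np

def confusion_matrix(pred, true, K=10):
    """Rows: segmentation output; columns: true labels (layout of Table 1)."""
    C = np.zeros((K, K), dtype=int)
    np.add.at(C, (np.asarray(pred), np.asarray(true)), 1)   # count each (pred, true) pair
    return C

def error_rate(C):
    """Fraction of points off the diagonal, assuming segment k corresponds to label k."""
    return 1.0 - np.trace(C) / C.sum()
```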



For comparison, errors reported using unsupervised clustering algorithms in the literature are: 12.9% for p-Laplacian (Bühler and Hein, 2009), 11.8% for ratio-minimization relaxed Cheeger cut (Szlam and Bresson, 2010), and 12.36% for the multicut version of the normalized 1-cut (Hein and Setzer, 2011). A more sophisticated graph-based diffusion method applied in a semi-supervised setup (transductive classification), with function-adapted eigenfunctions, a graph constructed with 13 neighbors, and self-tuning with the 9th neighbor, reported in (Szlam et al., 2008), obtains an error of 7.4%. Results with similar errors are reported in (Liu et al., 2010). Thus, the performance of the multiclass GL on this data set improves upon other published results, while requiring less preprocessing and a simpler regularization of the functions on the graph.



5 CONCLUSIONS 

We have proposed a new multiclass segmentation procedure, based on the diffuse interface model. The method obtains segmentations of several classes simultaneously without using one-vs-all or alternative sequences of binary segmentations required by other multiclass methods. The local scaling method of Zelnik-Manor and Perona, used to construct the graph, constitutes a useful representation of the characteristics of the data set and is adequate to deal with high-dimensional data.

Our modified diffusion method, represented by the non-linear smoothing term introduced in the Ginzburg-Landau functional, exploits the structure of the multiclass model and is not affected by the ordering of class labels. It efficiently propagates class information that is known beforehand, as evidenced by the small proportion of fidelity points (4% to 10% of the data set) needed to perform accurate segmentations. Moreover, the method is robust to initial conditions: as long as the initialization represents all classes uniformly, different initial random configurations produce very similar results. The main limitation of the method appears to be that fidelity points must be representative of the class distribution. As long as this holds, as in the examples discussed, the long-time behavior of the solution relies less on choosing the "right" initial conditions than do other learning techniques on graphs.

State-of-the-art results with small classification errors were obtained for all classification tasks. Furthermore, the results do not depend on the particular class label assignments. Future work includes investigating the diffuse interface parameter $\epsilon$. We conjecture that the proposed functional converges (in the $\Gamma$-convergence sense) to a total variational type functional on graphs as $\epsilon$ approaches zero, but the exact nature of the limiting functional is unknown.

Table 1: Confusion Matrix for the MNIST Data Segmentation.